Abstract — Unions of graph Fourier multipliers are an important
class of linear operators for processing signals defined on
graphs. We present a novel method to efficiently distribute the 
application of these operators to the high-dimensional signals 
collected by sensor networks. The proposed method features 
approximations of the graph Fourier multipliers by shifted 
Chebyshev polynomials, whose recurrence relations make them 
readily amenable to distributed computation. We demonstrate 
how the proposed method can be used in a distributed denoising 
task, and show that the communication requirements of the 
method scale gracefully with the size of the network. 

Index Terms — Chebyshev polynomial approximation, denois- 
ing, distributed optimization, regularization, signal processing on 
graphs, spectral graph theory, wireless sensor networks 

I. Introduction 

Wireless sensor networks are now prevalent in applications 
such as environmental monitoring, target tracking, surveil- 
lance, medical diagnostics, and manufacturing process flow. 
The sensor nodes are often deployed en masse to collectively 
achieve tasks such as estimation, detection, classification, and 
localization. While such networks have the ability to collect 
large amounts of data in a short time, they also face a number 
of resource constraints. First, they are energy constrained, 
as they are often expected to operate for long periods of 
time without human intervention, despite being powered by 
batteries or energy harvesting. Second, they may have limited 
communication range and capacity due to the need to save 
energy. Third, they may have limited on-board processing 
capabilities. Therefore, it is critical to develop distributed 
algorithms for in-network data processing that help balance the 
trade-offs between performance, communication bandwidth, 
and computational complexity. 

Due to the limited communication range of wireless sensor 
nodes, each sensor node in a large network is likely to 
communicate with only a small number of other nodes in 
the network. To model the communication patterns, we can 
write down a graph with each vertex corresponding to a sensor 
node and each edge corresponding to a pair of nodes that 
communicate. Moreover, because the communication graph is 
a function of the distances between nodes, it often captures 
spatial correlations between sensors' observations as well. 
That is, if two sensors are close enough to communicate,
their observations are more likely to be correlated. We can 
further specify these spatial correlations by adding weights 
to the edges of the graph, with higher weights associated to 
edges connecting sensors with closely correlated observations. 
For example, it is common to construct the graph with a 
thresholded Gaussian kernel weighting function based on the 
physical distance between nodes, where the weight of edge e 
connecting nodes i and j that are a distance d(i, j) apart is 

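$$w(e) = \begin{cases} \exp\left(-\frac{[d(i,j)]^2}{2\sigma^2}\right), & \text{if } \exp\left(-\frac{[d(i,j)]^2}{2\sigma^2}\right) \geq \kappa \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

for some parameters $\sigma$ and $\kappa$.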
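As a minimal sketch of this construction, the following builds the weighted graph from sensor coordinates. It assumes that the threshold $\kappa$ is applied to the Gaussian kernel value itself (an assumption, chosen to agree with the parameter values used in the numerical example of Section V-B); the function name and the dictionary-of-dictionaries graph representation are illustrative choices, not part of the method.

```python
import math

def gaussian_kernel_graph(points, sigma, kappa):
    """Return a symmetric adjacency map {i: {j: weight}}.

    Assumption: an edge is kept only when the Gaussian kernel value
    exp(-d^2 / (2 sigma^2)) is at least kappa.
    """
    adj = {i: {} for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            weight = math.exp(-d * d / (2.0 * sigma ** 2))
            if weight >= kappa:
                adj[i][j] = weight
                adj[j][i] = weight
    return adj
```

Under this reading, nearby sensors receive weights close to 1 and sensors whose kernel value falls below $\kappa$ are not connected at all.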

In this paper, we consider signals collected by a sensor 
network whose nodes can only send messages to their local 
neighbors (i.e., they cannot communicate directly with a 
central entity). While much of the literature on distributed 
signal processing (see, e.g., [1]-[4] and references therein)
focuses on coming to an agreement on simple features of the 
observed signal (e.g., consensus averaging, parameter estima- 
tion), we are more interested in processing the full function in 
a distributed manner, with each node having its own objective. 
Some example tasks under this umbrella include: 

• Distributed denoising - In a sensor network of N sensors, 
a noisy N-dimensional signal is observed, with each
component of the signal corresponding to the observation 
at one sensor location. Using the prior knowledge that the 
denoised signal should be smooth or piecewise smooth 
with respect to the underlying weighted graph structure, 
the sensors' task is to denoise each of their components 
of the signal by iteratively passing messages to their local 
neighbors and performing computations. 

• Distributed semi-supervised learning / binary classifica- 
tion - A binary label (-1 or 1) is associated with each 
sensor node; however, only a small number of nodes 
in the network have knowledge of their labels. The 
cooperative task is for each node to learn its label by 
iteratively passing messages to its local neighbors and 
performing computations. 

These and similar tasks have been considered in centralized 
settings in the relatively young field of signal processing on 
graphs. For example, [5]-[7] consider general regularization
frameworks on weighted graphs; [8]-[10] present graph-based
semi-supervised learning methods; and [11]-[14] consider regularization
and filtering on weighted graphs for image and
mesh processing. In distributed settings, [15] considers denoising
via wavelet processing and [16] presents a denoising algorithm
that projects the measured signal onto a low-dimensional
subspace spanned by smooth functions. References [17]-[19]
consider different distributed regression problems.

Our main contributions in this paper are i) to show that a key 
component of many distributed signal processing tasks is the 
application of linear operators that are unions of graph Fourier 
multipliers; and ii) to present a novel method to efficiently dis- 
tribute the application of the graph Fourier multiplier operators 
to the high-dimensional signals collected by sensor networks. 

To elaborate a bit, graph Fourier multiplier operators are 
the graph analog of filter banks, one of the most commonly 
used tools in digital signal processing. Multiplying a signal 
on the graph by one of these matrices is analogous to 
reshaping the signal's frequencies by multiplying it by a 
filter in the Fourier domain in classical signal processing. 
The crux of our novel distributed computational method is 
to approximate each graph Fourier multiplier by a truncated 
Chebyshev polynomial expansion. In a centralized setting, [20]
shows that the truncated Chebyshev polynomial expansion 
efficiently approximates the application of a spectral graph 
wavelet transform, which is a specific example of a union 
of graph Fourier multipliers. In this paper, we extend the 
Chebyshev polynomial approximation method to the general 
class of unions of graph Fourier multiplier operators, and show 
how the recurrence properties of the Chebyshev polynomials 
also enable distributed application of these operators. The 
communication requirements for distributed computation using 
this method scale gracefully with the number of sensors in the 
network (and, accordingly, the size of the signals). 

The remainder of the paper is organized as follows. In the next section,
we provide some background from spectral graph theory. In
Section III, we introduce graph Fourier multiplier operators
and show how they can be efficiently approximated with
shifted Chebyshev polynomials in a centralized setting. We
then discuss the distributed computation of quantities involving
these operators in Section IV, and provide some application
examples in Section V. Section VI concludes the paper.



II. Spectral Graph Theory 

Before proceeding, we introduce some basic notations and
definitions from spectral graph theory [21]. We model the
sensor network with an undirected, weighted graph $G = \{E, V, w\}$,
which consists of a set of vertices $V$, a set of
edges $E$, and a weight function $w : E \to \mathbb{R}^+$ that assigns
a non-negative weight to each edge. We assume the number
of sensors in the network, $N = |V|$, is finite, and the graph is
connected. The adjacency (or weight) matrix $A$ for a weighted
graph $G$ is the $N \times N$ matrix with entries $A_{m,n}$, where

$$A_{m,n} = \begin{cases} w(e), & \text{if } e \in E \text{ connects vertices } m \text{ and } n \\ 0, & \text{otherwise.} \end{cases}$$

The degree of each vertex is the sum of the weights of all the
edges incident to it. We define the degree matrix $D$ to have
diagonal elements equal to the degrees, and zeros elsewhere.
The non-normalized graph Laplacian is defined as $\mathcal{L} := D - A$.
For any $f \in \mathbb{R}^N$ on the vertices of the graph, $\mathcal{L}$ satisfies

$$(\mathcal{L}f)(m) = \sum_{m \sim n} A_{m,n}\left[f(m) - f(n)\right],$$

where $m \sim n$ indicates vertices $m$ and $n$ are connected.
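The neighbor-sum form of the Laplacian can be applied without ever forming the matrix. The sketch below assumes the graph is stored as an adjacency map `{m: {n: weight}}` with vertices numbered from 0; that representation and the function name are illustrative choices.

```python
def laplacian_apply(adj, f):
    """Return L f, where (L f)(m) = sum over neighbors n of A[m][n] * (f[m] - f[n])."""
    return [
        sum(weight * (f[m] - f[n]) for n, weight in adj[m].items())
        for m in range(len(f))
    ]
```

Each entry only touches the values at a vertex and its neighbors, which is the property the distributed method of Section IV relies on.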

As the graph Laplacian $\mathcal{L}$ is a real symmetric matrix, it has
a complete set of orthonormal eigenvectors. We denote these
by $\chi_\ell$ for $\ell = 0, \ldots, N-1$, with associated real, non-negative
eigenvalues $\lambda_\ell$ satisfying $\mathcal{L}\chi_\ell = \lambda_\ell\chi_\ell$. Zero appears as an
eigenvalue with multiplicity equal to the number of connected
components of the graph [21]. Without loss of generality, we
assume the eigenvalues of the Laplacian of the connected
graph $G$ to be ordered as

$$0 = \lambda_0 < \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_{N-1} := \lambda_{\max}.$$

Just as the classical Fourier transform is the expansion of
a function $f$ in terms of the eigenfunctions of the Laplace
operator,

$$\hat{f}(\omega) = \langle e^{i\omega x}, f \rangle = \int_{\mathbb{R}} f(x)e^{-i\omega x}\,dx,$$

the graph Fourier transform $\hat{f}$ of any function $f \in \mathbb{R}^N$
on the vertices of $G$ is the expansion of $f$ in terms of the
eigenfunctions of the graph Laplacian. It is defined by

$$\hat{f}(\ell) := \langle \chi_\ell, f \rangle = \sum_{n=1}^{N} \chi_\ell^*(n)f(n), \tag{2}$$

where we adopt the convention that the inner product be
conjugate-linear in the first argument. The inverse graph
Fourier transform is given by

$$f(n) = \sum_{\ell=0}^{N-1} \hat{f}(\ell)\chi_\ell(n). \tag{3}$$



III. Chebyshev Polynomial Approximation of 
Graph Fourier Multipliers 

In this section, we introduce graph Fourier multiplier opera- 
tors, unions of graph Fourier multiplier operators, and a com- 
putationally efficient approximation to unions of graph Fourier 
multiplier operators based on shifted Chebyshev polynomials. 
All methods discussed here are for a centralized setting, and 
we extend them to a distributed setting in Section IV.

A. Graph Fourier Multiplier Operators 

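For a function $f$ defined on the real line, a Fourier multiplier
operator or filter $\Psi$ reshapes the function's frequencies
through multiplication in the Fourier domain:

$$\widehat{\Psi f}(\omega) = g(\omega)\hat{f}(\omega), \quad \text{for every frequency } \omega.$$

Equivalently, denoting the Fourier and inverse Fourier transforms
by $\mathcal{F}$ and $\mathcal{F}^{-1}$, we have

$$\Psi f(x) = \mathcal{F}^{-1}\left(g(\omega)\mathcal{F}(f)(\omega)\right)(x) = \frac{1}{2\pi}\int_{\mathbb{R}} g(\omega)\hat{f}(\omega)e^{i\omega x}\,d\omega. \tag{4}$$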



We can extend this straightforwardly to functions defined on 
the vertices of a graph (and in fact to any group with a Fourier 
transform) by replacing the Fourier transform and its inverse 
in (4) with the graph Fourier transform and its inverse, defined
in (2) and (3). Namely, a graph Fourier multiplier operator is
a linear operator $\Psi : \mathbb{R}^N \to \mathbb{R}^N$ that can be written as

$$\Psi f(n) = \mathcal{F}^{-1}\left(g(\lambda_\ell)\mathcal{F}(f)(\ell)\right)(n) = \sum_{\ell=0}^{N-1} g(\lambda_\ell)\hat{f}(\ell)\chi_\ell(n). \tag{5}$$



We refer to $g(\cdot)$ as the multiplier. A high-level intuition behind
(5) is as follows. The eigenvectors corresponding to the lowest
eigenvalues of the graph Laplacian are the "smoothest" in the
sense that $|\chi_\ell(m) - \chi_\ell(n)|$ is small for neighboring vertices
$m$ and $n$. At the extreme is $\chi_0$, which is a constant vector
($\chi_0(m) = \chi_0(n)$ for all $m$ and $n$). The inverse graph Fourier
transform (3) provides a representation of a signal $f$ as a
superposition of the orthonormal set of eigenvectors of the
graph Laplacian. The effect of the graph Fourier multiplier
operator $\Psi$ is to modify the contribution of each eigenvector.
For example, applying a multiplier $g(\cdot)$ that is 1 for all $\lambda_\ell$
below some threshold, and 0 for all $\lambda_\ell$ above the threshold is
equivalent to projecting the signal onto the eigenvectors of the
graph Laplacian associated with the lowest eigenvalues. This
is analogous to low-pass filtering in the continuous domain.
Section V contains further intuition about and examples of
graph Fourier multiplier operators. For more properties of the
graph Laplacian eigenvectors, see [22].

B. Unions of Graph Fourier Multiplier Operators 

In order for our distributed computation method of the next 
section to be applicable to a wider range of applications, 
we can generalize slightly from graph Fourier multipliers 
to unions of graph Fourier multiplier operators. A union 
of graph Fourier multiplier operators is a linear operator 
$\Phi : \mathbb{R}^N \to \mathbb{R}^{\eta N}$ ($\eta \in \{1, 2, \ldots\}$) whose application to a
function $f \in \mathbb{R}^N$ can be written as (see also Figure 1)

$$\Phi f = [\Psi_1; \Psi_2; \ldots; \Psi_\eta]f = [(\Psi_1 f)_1; \ldots; (\Psi_1 f)_N; \ldots; (\Psi_\eta f)_1; \ldots; (\Psi_\eta f)_N] = [(\Phi f)_1; (\Phi f)_2; \ldots; (\Phi f)_{\eta N}],$$

where for every $j$, $\Psi_j : \mathbb{R}^N \to \mathbb{R}^N$ is a graph Fourier
multiplier operator with multiplier $g_j(\cdot)$, and

$$(\Phi f)_{(j-1)N+n} = \sum_{\ell=0}^{N-1} g_j(\lambda_\ell)\hat{f}(\ell)\chi_\ell(n), \tag{6}$$

for $j \in \{1, 2, \ldots, \eta\}$, $n \in \{1, 2, \ldots, N\}$.

Fig. 1. Application of a union of graph Fourier multiplier operators.



C. The Chebyshev Polynomial Approximation 

Exactly computing $\Phi f$ requires explicit computation of the
entire set of eigenvectors and eigenvalues of $\mathcal{L}$, which becomes
computationally challenging as the size of the network, $N$, increases,
even in a centralized setting. As discussed in detail in
[20, Section 6], a computationally efficient approximation $\tilde{\Phi}f$
of $\Phi f$ can be computed by approximating each multiplier $g_j(\cdot)$
by a truncated series of shifted Chebyshev polynomials. Doing
so circumvents the need to compute the full set of eigenvectors
and eigenvalues of $\mathcal{L}$. We summarize this approach below.

For $y \in [-1, 1]$, the Chebyshev polynomials
$\{T_k(y)\}_{k=0,1,2,\ldots}$ are generated by

$$T_k(y) := \begin{cases} 1, & \text{if } k = 0 \\ y, & \text{if } k = 1 \\ 2yT_{k-1}(y) - T_{k-2}(y), & \text{if } k \geq 2. \end{cases}$$

These Chebyshev polynomials form an orthogonal basis for
$L^2\left([-1, 1], \frac{dy}{\sqrt{1-y^2}}\right)$. So every function $h$ on $[-1, 1]$ that is
square integrable with respect to the measure $dy/\sqrt{1-y^2}$
can be represented as $h(y) = \frac{1}{2}b_0 + \sum_{k=1}^{\infty} b_k T_k(y)$, where
$\{b_k\}_{k=0,1,\ldots}$ is a sequence of Chebyshev coefficients that
depends on $h(\cdot)$. For a detailed overview of Chebyshev
polynomials, including the above definitions and properties,
see [23]-[25].
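The three-term recurrence is all that is needed to evaluate the polynomials. A minimal sketch (the function name is an illustrative choice) that can be checked against the standard identity $T_k(\cos\theta) = \cos(k\theta)$:

```python
import math

def chebyshev_values(y, order):
    """Return [T_0(y), ..., T_order(y)] using T_k = 2*y*T_{k-1} - T_{k-2}."""
    values = [1.0, y]
    for k in range(2, order + 1):
        values.append(2.0 * y * values[k - 1] - values[k - 2])
    return values[: order + 1]
```

For example, `chebyshev_values(0.3, 6)[k]` agrees with `math.cos(k * math.acos(0.3))` up to rounding error.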

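By shifting the domain of the Chebyshev polynomials to
$[0, \lambda_{\max}]$ via the transformation $x = \frac{\lambda_{\max}}{2}(y + 1)$, we can
represent each multiplier as

$$g_j(x) = \frac{1}{2}c_{j,0} + \sum_{k=1}^{\infty} c_{j,k}\bar{T}_k(x), \quad \text{for all } x \in [0, \lambda_{\max}], \tag{7}$$

where $\bar{T}_k(x) := T_k\left(\frac{x - \alpha}{\alpha}\right)$, $\alpha := \frac{\lambda_{\max}}{2}$, and

$$c_{j,k} := \frac{2}{\pi}\int_0^{\pi} \cos(k\theta)\, g_j\left(\alpha(\cos(\theta) + 1)\right) d\theta. \tag{8}$$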
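The coefficient integral (8) has no closed form for a general multiplier, but it is easy to approximate numerically. The sketch below uses a midpoint rule on $[0, \pi]$, which is an implementation choice and not something the method prescribes; the function names and the default number of quadrature points are likewise illustrative.

```python
import math

def chebyshev_coefficients(g, lambda_max, order, num_points=1000):
    """Approximate c_k = (2/pi) * int_0^pi cos(k t) g(alpha (cos t + 1)) dt, k = 0..order."""
    alpha = lambda_max / 2.0
    thetas = [math.pi * (p + 0.5) / num_points for p in range(num_points)]
    samples = [g(alpha * (math.cos(t) + 1.0)) for t in thetas]
    # (2/pi) * (pi/num_points) collapses to 2/num_points for the midpoint rule.
    return [
        (2.0 / num_points) * sum(s * math.cos(k * t) for s, t in zip(samples, thetas))
        for k in range(order + 1)
    ]

def evaluate_truncated_series(coeffs, lambda_max, x):
    """Evaluate c_0/2 + sum_{k>=1} c_k * Tbar_k(x) with Tbar_k(x) = T_k((x - alpha)/alpha)."""
    alpha = lambda_max / 2.0
    y = (x - alpha) / alpha
    t_prev, t_curr = 1.0, y
    total = 0.5 * coeffs[0]
    for k in range(1, len(coeffs)):
        total += coeffs[k] * t_curr
        t_prev, t_curr = t_curr, 2.0 * y * t_curr - t_prev
    return total
```

For a smooth multiplier such as $g(x) = e^{-x}$, the truncated series with a modest order already matches the multiplier closely over the whole interval $[0, \lambda_{\max}]$.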



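For $k \geq 2$, the shifted Chebyshev polynomials satisfy

$$\bar{T}_k(x) = \frac{2}{\alpha}(x - \alpha)\bar{T}_{k-1}(x) - \bar{T}_{k-2}(x).$$

Thus, for any $f \in \mathbb{R}^N$, we have

$$\bar{T}_k(\mathcal{L})f = \frac{2}{\alpha}(\mathcal{L} - \alpha I)\left(\bar{T}_{k-1}(\mathcal{L})f\right) - \bar{T}_{k-2}(\mathcal{L})f, \tag{9}$$

where $\bar{T}_k(\mathcal{L}) \in \mathbb{R}^{N \times N}$ and the $n$-th element of $\bar{T}_k(\mathcal{L})f$ is
given by

$$\left(\bar{T}_k(\mathcal{L})f\right)_n := \sum_{\ell=0}^{N-1} \bar{T}_k(\lambda_\ell)\hat{f}(\ell)\chi_\ell(n). \tag{10}$$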

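Now, to approximate the operator $\Phi$, we can approximate
each multiplier $g_j(\cdot)$ by the first $M$ terms in its Chebyshev
polynomial expansion (7). Then, for every $j \in \{1, 2, \ldots, \eta\}$
and $n \in \{1, 2, \ldots, N\}$, we have

$$\begin{aligned} (\tilde{\Phi}f)_{(j-1)N+n} &:= \left(\frac{1}{2}c_{j,0}f + \sum_{k=1}^{M} c_{j,k}\bar{T}_k(\mathcal{L})f\right)_n \\ &= \sum_{\ell=0}^{N-1}\left[\frac{1}{2}c_{j,0} + \sum_{k=1}^{M} c_{j,k}\bar{T}_k(\lambda_\ell)\right]\hat{f}(\ell)\chi_\ell(n) \\ &\approx \sum_{\ell=0}^{N-1} g_j(\lambda_\ell)\hat{f}(\ell)\chi_\ell(n) = (\Phi f)_{(j-1)N+n}. \end{aligned} \tag{11}$$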

To recap, we propose to compute $\tilde{\Phi}f$ by first computing the
Chebyshev coefficients $\{c_{j,k}\}_{j=1,2,\ldots,\eta;\ k=0,1,\ldots,M}$ according
to (8), and then computing the sum in (11). The computational
benefit of the Chebyshev polynomial approximation arises
in (11) from the fact that the vector $\bar{T}_k(\mathcal{L})f$ can be computed
recursively from $\bar{T}_{k-1}(\mathcal{L})f$ and $\bar{T}_{k-2}(\mathcal{L})f$ according to (9).
The computational cost of doing so is dominated by the
matrix-vector multiplication of the graph Laplacian $\mathcal{L}$, which
is proportional to the number of edges, $|E|$ [20]. Therefore,
if the underlying communication graph is sparse (i.e., $|E|$
scales linearly with the network size $N$), it is far more
computationally efficient to compute $\tilde{\Phi}f$ than $\Phi f$. Finally,
we note that in practice, setting the Chebyshev approximation
order $M$ to around 20 results in $\tilde{\Phi}$ approximating $\Phi$ very
closely in all of the applications we have examined.
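To make the centralized procedure concrete, the following sketch applies a single approximated multiplier to a signal using only sparse Laplacian products and the recurrence (9). It assumes the coefficients $c_0, \ldots, c_M$ of one multiplier are already available and that the graph is stored as an adjacency map `{m: {n: weight}}`; both are illustrative choices.

```python
def apply_chebyshev_filter(adj, f, coeffs, lambda_max):
    """Return c_0/2 * f + sum_{k=1}^{M} c_k * Tbar_k(L) f via the recurrence (9)."""
    alpha = lambda_max / 2.0

    def shifted_laplacian(v):
        # ((L - alpha I) / alpha) v, computed from neighbor differences only.
        return [
            sum(w * (v[m] - v[n]) for n, w in adj[m].items()) / alpha - v[m]
            for m in range(len(v))
        ]

    t_prev = list(f)                # Tbar_0(L) f
    t_curr = shifted_laplacian(f)   # Tbar_1(L) f
    out = [0.5 * coeffs[0] * v for v in f]
    for k in range(1, len(coeffs)):
        out = [o + coeffs[k] * t for o, t in zip(out, t_curr)]
        t_next = [2.0 * s - p for s, p in zip(shifted_laplacian(t_curr), t_prev)]
        t_prev, t_curr = t_curr, t_next
    return out
```

As a check, the multiplier $g(x) = x$ has coefficients $c_0 = 2\alpha$, $c_1 = \alpha$, so `apply_chebyshev_filter(adj, f, [2 * alpha, alpha], lambda_max)` reproduces $\mathcal{L}f$ exactly; no eigenvectors are computed at any point.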

IV. Distributed Computation 

In the previous section, we showed that the Chebyshev 
polynomial approximation to a union of graph Fourier mul- 
tipliers provides computational efficiency gains, even in a 
centralized computation setting. In this section, we discuss the 
second benefit of the Chebyshev polynomial approximation: it 
is easily distributable. 



A. Distributed Computation of $\tilde{\Phi}f$

We consider the following scenario. There is a network of N 
nodes, and each node n begins with the following knowledge: 

• $f(n)$, the $n$-th component of the signal $f$

• The identity of its neighbors, and the weights of the graph
edges connecting itself to each of its neighbors

• The first $M$ Chebyshev coefficients, $c_{j,k}$, for $j \in \{1, 2, \ldots, \eta\}$
and $k \in \{0, 1, 2, \ldots, M\}$. These can either
be computed centrally according to (8) and then transmitted
throughout the network, or each node can begin
with knowledge of the multipliers, $\{g_j(\cdot)\}_{j=1,2,\ldots,\eta}$, and
precompute the Chebyshev coefficients according to (8)

• An upper bound on $\lambda_{\max}$, the largest eigenvalue of the
graph Laplacian. This bound need not be tight, so we
can precompute a bound such as $\lambda_{\max} \leq \max\{d(m) + d(n);\ m \sim n\}$,
where $d(n)$ is the degree of node $n$ [26]

The task is for each network node n to compute 



$$\left\{(\tilde{\Phi}f)_{(j-1)N+n}\right\}_{j=1,2,\ldots,\eta} \tag{12}$$



by iteratively exchanging messages with its local neighbors in 
the network and performing some computations. 

As a result of (11), for node $n$ to compute the desired
sequence in (12), it suffices to learn $\{(\bar{T}_k(\mathcal{L})f)_n\}_{k=1,2,\ldots,M}$.
Note that $(\bar{T}_1(\mathcal{L})f)_n = \left(\frac{1}{\alpha}(\mathcal{L} - \alpha I)f\right)_n$ and $\mathcal{L}_{n,m} = 0$
for all nodes $m$ that are not neighbors of node $n$. Thus, to
compute $(\bar{T}_1(\mathcal{L})f)_n$, sensor node $n$ just needs to receive
$f(m)$ from all neighbors $m$. So once all nodes send their
component of the signal to their neighbors, they are able
to compute their respective components of $\bar{T}_1(\mathcal{L})f$. In the
next step, each node $n$ sends the newly computed quantity
$(\bar{T}_1(\mathcal{L})f)_n$ to all of its neighbors, enabling the distributed
computation of $\bar{T}_2(\mathcal{L})f$ according to (9). The iterative process
of computation and local communication continues for $M$
rounds until each node $n$ has computed the required sequence
$\{(\bar{T}_k(\mathcal{L})f)_n\}_{k=1,2,\ldots,M}$. In all, $2M|E|$ messages of length
1 are required for every node $n$ to compute its sequence of
coefficients in (12) in a distributed fashion. This distributed
computation process is summarized in Algorithm 1.

An important point to emphasize again is that although the
operator $\Phi$ and its approximation $\tilde{\Phi}$ are defined through the
eigenvectors of the graph Laplacian, the Chebyshev polynomial
approximation helps the sensor nodes apply the operator
to the signal without explicitly computing (individually or 
collectively) the eigenvalues or eigenvectors of the Laplacian, 
other than the upper bound on its spectrum. Rather, they 
initially communicate their component of the signal to their 
neighbors, and then communicate simple weighted combi- 
nations of the messages received in the previous stage in 
subsequent iterations. In this way, information about each 
component of the signal / diffuses through the network with- 
out direct communication between non-neighboring nodes. 
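The message-passing scheme can be illustrated with a single-process simulation. This is only a sketch under simplifying assumptions: rounds are synchronous, no messages are lost, one multiplier is applied, and the "network" is a Python dictionary rather than real radios. It tracks what each node would receive from its neighbors and counts the scalar messages exchanged.

```python
def distributed_chebyshev(adj, f, coeffs, lambda_max):
    """Simulate M rounds of neighbor exchange; return (filtered signal, message count)."""
    alpha = lambda_max / 2.0
    nodes = range(len(f))
    degree = [sum(adj[n].values()) for n in nodes]
    messages = 0
    t_km2 = None
    t_km1 = list(f)  # node n initially holds only its own f[n]
    out = [0.5 * coeffs[0] * f[n] for n in nodes]
    for k in range(1, len(coeffs)):
        # Round k: every node sends its latest value to each neighbor.
        inbox = {n: {m: t_km1[m] for m in adj[n]} for n in nodes}
        messages += sum(len(adj[n]) for n in nodes)
        t_k = []
        for n in nodes:
            # Local row of the Laplacian: own degree plus received neighbor values.
            lap_n = degree[n] * t_km1[n] - sum(adj[n][m] * inbox[n][m] for m in adj[n])
            shifted_n = lap_n / alpha - t_km1[n]
            t_k.append(shifted_n if k == 1 else 2.0 * shifted_n - t_km2[n])
        out = [o + coeffs[k] * t for o, t in zip(out, t_k)]
        t_km2, t_km1 = t_km1, t_k
    return out, messages
```

Each round costs one scalar per directed neighbor pair, that is $2|E|$ messages, so $M$ rounds give the $2M|E|$ total stated above.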

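Algorithm 1 Distributed Computation of $\tilde{\Phi}f$

Inputs at node $n$: $f_n$, $\mathcal{L}_{n,m}\ \forall m$, $\{c_{j,k}\}_{j=1,2,\ldots,\eta;\ k=0,1,\ldots,M}$, and $\lambda_{\max}$
Outputs at node $n$: $\{(\tilde{\Phi}f)_{(j-1)N+n}\}_{j=1,2,\ldots,\eta}$

1: Set $(\bar{T}_0(\mathcal{L})f)_n = f_n$
2: Transmit $f_n$ to all neighbors $\mathcal{N}_n := \{m : \mathcal{L}_{n,m} < 0\}$
3: Receive $f_m$ from all neighbors $\mathcal{N}_n$
4: Compute and store
   $(\bar{T}_1(\mathcal{L})f)_n = \sum_{m \in \mathcal{N}_n \cup \{n\}} \frac{1}{\alpha}\mathcal{L}_{n,m}f_m - f_n$
5: for $k = 2, \ldots, M$ do
6:   Transmit $(\bar{T}_{k-1}(\mathcal{L})f)_n$ to all neighbors $\mathcal{N}_n$
7:   Receive $(\bar{T}_{k-1}(\mathcal{L})f)_m$ from all neighbors $\mathcal{N}_n$
8:   Compute and store
     $(\bar{T}_k(\mathcal{L})f)_n = \sum_{m \in \mathcal{N}_n \cup \{n\}} \frac{2}{\alpha}\mathcal{L}_{n,m}(\bar{T}_{k-1}(\mathcal{L})f)_m - 2(\bar{T}_{k-1}(\mathcal{L})f)_n - (\bar{T}_{k-2}(\mathcal{L})f)_n$
9: end for
10: for $j \in \{1, 2, \ldots, \eta\}$ do
11:   Output $(\tilde{\Phi}f)_{(j-1)N+n} = \frac{1}{2}c_{j,0}f_n + \sum_{k=1}^{M} c_{j,k}(\bar{T}_k(\mathcal{L})f)_n$
12: end for

B. Distributed Computation of $\tilde{\Phi}^*a$

The application of the adjoint $\tilde{\Phi}^*$ of the Chebyshev polynomial
approximate operator $\tilde{\Phi}$ can also be computed in a
distributed manner. Let

$$a = [a_1; a_2; \ldots; a_\eta] \in \mathbb{R}^{\eta N},$$

where $a_j \in \mathbb{R}^N$. Then it is straightforward to show that

$$(\tilde{\Phi}^*a)_n = \left(\sum_{j=1}^{\eta}\left(\frac{1}{2}c_{j,0}a_j + \sum_{k=1}^{M} c_{j,k}\bar{T}_k(\mathcal{L})a_j\right)\right)_n. \tag{13}$$

We assume each node $n$ starts with knowledge of $a_j(n)$ for all
$j \in \{1, 2, \ldots, \eta\}$. For each $j \in \{1, 2, \ldots, \eta\}$, the distributed
computation of the corresponding term on the right-hand side
of (13) is done in an analogous manner to the distributed
computation of $\tilde{\Phi}f$ discussed above. Since this has to be done
for each $j$, $2M|E|$ messages, each a vector of length $\eta$, are
required for every node $n$ to compute $(\tilde{\Phi}^*a)_n$.

C. Distributed Computation of $\tilde{\Phi}^*\tilde{\Phi}f$

Using the property of Chebyshev polynomials that

$$T_k(x)T_{k'}(x) = \frac{1}{2}\left[T_{k+k'}(x) + T_{|k-k'|}(x)\right],$$

we can write (see [20] for a similar calculation)

$$(\tilde{\Phi}^*\tilde{\Phi}f)_n = \left(\frac{1}{2}d_0 f + \sum_{k=1}^{2M} d_k\bar{T}_k(\mathcal{L})f\right)_n.$$

Therefore, with each node $n$ starting with $f(n)$ as in Section
IV-A, the nodes can compute $\tilde{\Phi}^*\tilde{\Phi}f$ in a distributed manner
using $4M|E|$ messages of length 1, with each node $n$ finishing
with knowledge of $(\tilde{\Phi}^*\tilde{\Phi}f)_n$.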

V. Application Examples 

In this section, we provide more detailed explanations 
of how the Chebyshev polynomial approximation of graph 
Fourier multipliers can be used in the context of specific 
distributed signal processing tasks. 

A. Distributed Smoothing 

Perhaps the simplest example application is distributed 
smoothing with the heat kernel as the graph Fourier multiplier. 
One way to smooth a signal $y \in \mathbb{R}^N$ is to compute $H_t y$, where,
for a fixed $t$, $(H_t y)(n) := \sum_{\ell=0}^{N-1} e^{-t\lambda_\ell}\hat{y}(\ell)\chi_\ell(n)$. $H_t$ clearly
satisfies our definition of a graph Fourier multiplier operator
(with $\eta = 1$). In the context of a centralized image smoothing
application, [11] discusses in detail the heat kernel, $H_t$, and
its relationship to classical Gaussian filtering. Similar to the
example at the end of Section III-A, the main idea is that the
multiplier $e^{-t\lambda_\ell}$ acts as a low-pass filter that attenuates the
higher frequency (less smooth) components of $y$.

Now, to perform distributed smoothing, we just need to
compute $\tilde{H}_t y$ in a distributed manner according to Algorithm
1, where $\tilde{H}_t$ is the shifted Chebyshev polynomial approximation
to the graph Fourier multiplier operator $H_t$.

B. Distributed Regularization 

Regularization is a common signal processing technique to 
solve ill-posed inverse problems using a priori information 
about a target signal to recover it accurately. Here we use 
regularization to solve the distributed denoising task discussed 
in Section I, starting with a noisy signal $y \in \mathbb{R}^N$ defined on
a graph of $N$ sensors. The prior belief we want to enforce is
that the target signal is smooth with respect to the underlying
graph topology. The class of regularization terms we consider
is $f^{\mathrm{T}}\mathcal{L}^r f$ for $r \geq 1$, and the resulting regularization problem
has the form

$$\operatorname*{argmin}_f \frac{\tau}{2}\|f - y\|_2^2 + f^{\mathrm{T}}\mathcal{L}^r f. \tag{14}$$

To see intuitively why incorporating such a regularization term
into the objective function encourages smooth signals (with
$r = 1$ as an example), note that $f^{\mathrm{T}}\mathcal{L}f = 0$ if and only if $f$ is
constant across all vertices, and, more generally

$$f^{\mathrm{T}}\mathcal{L}f = \frac{1}{2}\sum_{m \in V}\sum_{n \sim m} A_{m,n}\left[f(m) - f(n)\right]^2,$$

so $f^{\mathrm{T}}\mathcal{L}f$ is small when the signal $f$ has similar values at
neighboring vertices with large weights (i.e., it is smooth).
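The identity between the quadratic form and the weighted sum of squared differences is easy to confirm numerically. The sketch below assumes a symmetric adjacency map `{m: {n: weight}}`, so every edge appears twice in the double sum, which is what the factor $\frac{1}{2}$ accounts for; the function names are illustrative.

```python
def smoothness_penalty(adj, f):
    """Return 0.5 * sum_m sum_{n ~ m} A[m][n] * (f[m] - f[n])**2."""
    return 0.5 * sum(
        w * (f[m] - f[n]) ** 2 for m in adj for n, w in adj[m].items()
    )

def quadratic_form(adj, f):
    """Return f^T L f, computed as the inner product of f with L f."""
    lap_f = [sum(w * (f[m] - f[n]) for n, w in adj[m].items()) for m in range(len(f))]
    return sum(a * b for a, b in zip(f, lap_f))
```

Both functions return 0 for a constant signal and grow as values at strongly connected neighbors move apart.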

We now show how our novel method is useful in solving 
this distributed regularization problem. 



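Proposition 1: The solution to (14) is given by $Ry$, where
$R$ is a graph Fourier multiplier operator of the form (5), with
multiplier $g(\lambda_\ell) = \frac{\tau}{\tau + 2\lambda_\ell^r}$.¹

Proof: The objective function in (14) is convex in $f$.
Setting its gradient with respect to $f$ to zero shows that any solution $f^*$ to

$$\mathcal{L}^r f^* + \frac{\tau}{2}(f^* - y) = 0 \tag{15}$$

is a solution to (14).² Taking the graph Fourier transform of
(15) yields

$$\widehat{\mathcal{L}^r f^*}(\ell) + \frac{\tau}{2}\left(\hat{f^*}(\ell) - \hat{y}(\ell)\right) = 0, \quad \forall \ell \in \{0, 1, \ldots, N-1\}. \tag{16}$$

From the real, symmetric nature of $\mathcal{L}$ and the definition of the
Laplacian eigenvectors ($\mathcal{L}\chi_\ell = \lambda_\ell\chi_\ell$), we have:

$$\widehat{\mathcal{L}^r f^*}(\ell) = \chi_\ell^*\mathcal{L}^r f^* = (\mathcal{L}^r\chi_\ell)^* f^* = \lambda_\ell^r\chi_\ell^* f^* = \lambda_\ell^r\hat{f^*}(\ell). \tag{17}$$

Substituting (17) into (16) and rearranging, we have

$$\hat{f^*}(\ell) = \frac{\tau}{\tau + 2\lambda_\ell^r}\hat{y}(\ell), \quad \forall \ell \in \{0, 1, \ldots, N-1\}. \tag{18}$$

Finally, taking the inverse graph Fourier transform of (18), we
have

$$f^*(n) = \sum_{\ell=0}^{N-1}\hat{f^*}(\ell)\chi_\ell(n) = \sum_{\ell=0}^{N-1}\left[\frac{\tau}{\tau + 2\lambda_\ell^r}\right]\hat{y}(\ell)\chi_\ell(n), \quad \forall n \in \{1, 2, \ldots, N\}. \tag{19}$$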
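As a small sanity check of Proposition 1, consider a hypothetical two-node graph joined by one unit-weight edge, whose Laplacian has eigenvalues 0 (constant eigenvector) and 2 (difference eigenvector). Taking $r = 1$ and reading the objective as $\frac{\tau}{2}\|f - y\|_2^2 + f^{\mathrm{T}}\mathcal{L}f$, the multiplier $\tau/(\tau + 2\lambda)$ leaves the mean untouched and scales the difference component by $\tau/(\tau + 4)$. The sketch below (function names are illustrative) applies that filter and evaluates the residual of the optimality condition $\mathcal{L}f^* + \frac{\tau}{2}(f^* - y) = 0$.

```python
def regularize_two_nodes(y, tau):
    """Apply g(lambda) = tau / (tau + 2*lambda) on a two-node, unit-weight graph."""
    mean = 0.5 * (y[0] + y[1])        # component along chi_0 (lambda = 0), gain 1
    half_diff = 0.5 * (y[0] - y[1])   # component along chi_1 (lambda = 2)
    gain = tau / (tau + 2.0 * 2.0)
    return [mean + gain * half_diff, mean - gain * half_diff]

def optimality_residual(f, y, tau):
    """Residual of L f + (tau/2) (f - y) on the same two-node graph."""
    lap_f = [f[0] - f[1], f[1] - f[0]]
    return [lap_f[i] + 0.5 * tau * (f[i] - y[i]) for i in range(2)]
```

For `y = [1.0, 3.0]` and `tau = 1.0`, the filtered signal is pulled toward the mean and the residual vanishes up to rounding, as the proposition predicts.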



So, one way to do distributed denoising is to compute
$\tilde{R}y$, the Chebyshev polynomial approximation of $Ry$, in a
distributed manner via Algorithm 1. We show this now with a
numerical example. We place 500 sensors randomly in the
$[0, 1] \times [0, 1]$ square. We then construct a weighted graph
according to the thresholded Gaussian kernel weighting (1)
with $\sigma = 0.074$ and $\kappa = 0.600$, so that two sensor nodes
are connected if their physical separation is less than 0.075.
We create a smooth 500-dimensional signal with the $n$-th
component given by $f^0_n = n_x + n_y - 1$, where $n_x$ and $n_y$ are
node $n$'s $x$ and $y$ coordinates in $[0, 1] \times [0, 1]$. One instance of
such a network and signal $f^0$ are shown in Figure 2, and the
eigenvectors of the graph Laplacian are shown in Figure 3.

Next, we corrupt each component of the signal $f^0$ with
uncorrelated additive Gaussian noise with mean zero and standard
deviation 0.5. Then we apply the graph Fourier multiplier
operator $\tilde{R}$, the Chebyshev polynomial approximation to $R$
from Proposition 1 with $\tau = r = 1$. The multiplier and its
Chebyshev polynomial approximations are shown in Figure 4,
and the denoised signal $\tilde{R}y$ is shown in Figure 5. We repeated
this entire experiment 1000 times, with a new random graph
and random noise each time, and the average mean square
error for the denoised signals was 0.013, as compared to 0.250
average mean square error for the noisy signals.

¹ This filter $g(\lambda_\ell)$ is the graph analog of a first-order Bessel filter from
classical signal processing of functions on the real line.

² In the case $r = 1$, the optimality equation (15) corresponds to the
optimality equation in [12, Section III-A] with $p = 2$ in that paper.




Fig. 2. A network of 500 sensors placed randomly in the $[0, 1] \times [0, 1]$
square. The background colors represent the values of the smooth signal $f^0$.



Fig. 3. Some eigenvectors of the Laplacian of the graph shown in Figure
2. The blue bars represent positive values and the black bars negative values.
(a) $\chi_0$, the constant eigenvector associated with $\lambda_0 = 0$. (b) $\chi_1$, the Fiedler
vector associated with the lowest strictly positive eigenvalue, nicely separates
the graph into two components. (c) $\chi_2$ is also a smooth eigenvector. (d) $\chi_{50}$
is far less smooth with some large differences across neighboring nodes.



We conclude this section by returning to the distributed 
binary classification task discussed in the introduction. In [9],
Belkin et al. show that the regularizer $f^{\mathrm{T}}\mathcal{L}^r f$ also works
well in graph-based semi-supervised learning. One approach to
distributed binary classification is to let $y_n$ be the labels (-1 or
1) of those nodes who know their labels, and 0 otherwise. Then
the nodes compute $\tilde{R}y$ in a distributed manner via Algorithm
1, and each node $n$ sets its label to 1 if $(\tilde{R}y)_n > 0$ and -1
otherwise. We believe our approach to distributedly applying 
graph Fourier multipliers can also be used for more general 
learning problems, but we leave this for future work. 

C. Distributed Wavelet Denoising 

In this section, we consider an alternate method of dis- 
tributed denoising that may be better suited to signals that are 
piecewise smooth on the graph, but not necessarily globally 



smooth. The setup is the same as in Section V-B, with a noisy
signal $y \in \mathbb{R}^N$, and each sensor $n$ observing $y_n$. Instead of
starting with a prior that the signal is globally smooth, we start
with a prior belief that the signal is sparse in the spectral graph
wavelet domain [20]. The spectral graph wavelet transform,
$W$, defined in [20], is precisely of the form of $\Phi$ in (6).
Namely, it is composed of one multiplier, $h(\cdot)$, that acts as
a low-pass filter to stably represent the signal's low frequency
content, and $J$ wavelet operators, defined by $g_j(\lambda_\ell) = g(t_j\lambda_\ell)$,
where $\{t_j\}_{j=1,2,\ldots,J}$ is a set of scales and $g(\cdot)$ is the wavelet
multiplier that acts as a band-pass filter.

Fig. 4. The regularizing multiplier associated with the graph
Fourier multiplier operator $R$ from Proposition 1. Here, $\tau = r = 1$.
Shifted Chebyshev polynomial approximations to the multiplier are shown
for different values of the approximation order $M$.

Fig. 5. A denoising example on the graph shown in Figure 2, using the
regularizing multiplier shown in Figure 4. (a) The original signal $f^0_n = n_x + n_y - 1$,
where $n_x$ and $n_y$ are the $x$ and $y$ coordinates of sensor node $n$. (b) The
additive Gaussian noise. (c) The noisy signal $y$. (d) The denoised signal $\tilde{R}y$.

The most common way to incorporate a sparse prior in a
centralized setting is to regularize via a weighted version of the
least absolute shrinkage and selection operator (lasso) [27],
also called basis pursuit denoising:

$$\operatorname*{argmin}_a \frac{1}{2}\|y - W^*a\|_2^2 + \|a\|_{1,\mu}, \tag{20}$$

where $\|a\|_{1,\mu} := \sum_{i=1}^{N(J+1)} \mu_i|a_i|$. The optimization problem
in (20) can be solved, for example, by iterative soft thresholding
[29]. The initial estimate of the wavelet coefficients $a^{(0)}$
is arbitrary, and at each iteration of the soft thresholding
algorithm, the update of the estimated wavelet coefficients is
given by

$$a_i^{(k)} = \mathcal{S}_{\mu_i\tau}\left(\left(a^{(k-1)} + \tau W\left[y - W^*a^{(k-1)}\right]\right)_i\right), \quad i = 1, 2, \ldots, N(J+1);\ k = 1, 2, \ldots, \tag{21}$$

where $\tau$ is the step size and $\mathcal{S}_{\mu_i\tau}$ is the shrinkage or soft
thresholding operator

$$\mathcal{S}_{\mu_i\tau}(z) := \begin{cases} 0, & \text{if } |z| \leq \mu_i\tau \\ z - \operatorname{sgn}(z)\mu_i\tau, & \text{otherwise.} \end{cases}$$

The iterative soft thresholding algorithm converges to $a^*$, the
minimizer of (20), if $\tau < \frac{2}{\|W^*\|^2}$ [30]. The final denoised
estimate of the signal is then given by $W^*a^*$.
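The thresholding update is simple to express in code. The sketch below reads the update as $a^{(k)} = \mathcal{S}_{\mu\tau}(a^{(k-1)} + \tau W[y - W^*a^{(k-1)}])$ applied componentwise, and takes the transform and its adjoint as caller-supplied functions; `apply_w` and `apply_w_adjoint` are hypothetical placeholders for whatever implementation of $W$ and $W^*$ is available, not part of any library.

```python
import math

def soft_threshold(z, threshold):
    """Return 0 if |z| <= threshold, otherwise shrink z toward zero by threshold."""
    if abs(z) <= threshold:
        return 0.0
    return z - math.copysign(threshold, z)

def ista_step(a, y, apply_w, apply_w_adjoint, mu, tau):
    """One iterative soft thresholding update of the coefficient vector a."""
    residual = [yi - ri for yi, ri in zip(y, apply_w_adjoint(a))]
    moved = [ai + tau * gi for ai, gi in zip(a, apply_w(residual))]
    return [soft_threshold(v, m * tau) for v, m in zip(moved, mu)]
```

With the identity standing in for both operators, one step from `a = [0.0, 0.0]` simply soft-thresholds `y`, which makes the shrinkage effect easy to see in isolation.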

We now turn to the issue of how to implement the above al- 
gorithm in a distributed fashion by sending messages between 
neighbors in the network. One option would be to use the 
distributed lasso algorithm of [19], which is a special case of
the Alternating Direction Method of Multipliers [31, p. 253].
In every iteration of that algorithm, each node transmits its 
current estimate of all the wavelet coefficients to its local 
neighbors. With a transform the size of the spectral graph 
wavelet transform, this requires 2\E\ total messages at every 
iteration, with each message being a vector of length N(J+1). 
A method where the amount of communicated information 
does not grow with N (beyond the number of edges, \E\) 
would be highly preferable. 

The Chebyshev polynomial approximation of the spectral
graph wavelet transform allows us to accomplish this goal. Our
approach is to approximate $W$ by $\tilde{W}$, and use the distributed
implementation of the approximate wavelet transform and its
adjoint to perform iterative soft thresholding. In the first soft
thresholding iteration, each node $n$ must learn $(\tilde{W}y)_{(j-1)N+n}$
at all scales $j$, via Algorithm 1. These coefficients are then
stored for future iterations. In the $k$-th iteration, each node $n$
must learn the $J + 1$ coefficients of $\tilde{W}\tilde{W}^*a^{(k-1)}$ centered
at $n$, by sequentially applying the operators $\tilde{W}^*$ and $\tilde{W}$ in a
distributed manner via the methods of Sections IV-B and IV-A,
respectively. Finally, when a stopping criterion for the soft
thresholding is satisfied, the adjoint operator $\tilde{W}^*$ is applied
again in a distributed manner to the resulting coefficients $a^*$,
and node $n$'s denoised estimate of its signal is $(\tilde{W}^*a^*)_n$.

We now examine the communication requirements of this
approach. Recall from Section IV that $2M|E|$ messages
of length 1 are required to compute $\tilde{W}y$ in a distributed
fashion. Distributed computation of $\tilde{W}\tilde{W}^*a^{(k-1)}$, the other
term needed in the iterative thresholding update (21), requires
$2M|E|$ messages of length $J + 1$ and $2M|E|$ messages of
length 1. The final application of the adjoint operator $\tilde{W}^*$ to
recover the denoised signal estimates requires another $2M|E|$
messages, each a vector of length $J + 1$. Therefore, the Chebyshev
polynomial approximation to the spectral graph wavelet
transform enables us to iteratively solve the weighted lasso
in a distributed manner where the communication workload
only scales with the size of the network through $|E|$, and is
otherwise independent of the network dimension $N$.
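The difference in scaling can be made concrete by counting scalars sent per soft thresholding iteration under the two message patterns described above. In the sketch below, the example values for the number of edges and the number of wavelet scales are hypothetical and chosen only for illustration.

```python
def scalars_per_iteration_chebyshev(num_edges, order, num_scales):
    """2M|E| messages of length J+1 plus 2M|E| messages of length 1."""
    return 2 * order * num_edges * (num_scales + 1) + 2 * order * num_edges

def scalars_per_iteration_full_exchange(num_edges, num_nodes, num_scales):
    """2|E| messages, each carrying all N(J+1) wavelet coefficients."""
    return 2 * num_edges * num_nodes * (num_scales + 1)
```

For instance, with 1000 edges, $M = 20$, $J = 4$, and $N = 500$, the Chebyshev approach sends 240,000 scalars per iteration, while exchanging full coefficient vectors sends 5,000,000; only the latter grows when $N$ increases with $|E|$ held fixed.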

VI. Concluding Remarks and Future Work 

We presented a novel method to distribute a class of linear 
operators called unions of graph Fourier multiplier operators. 
The main idea is to approximate the graph Fourier multipliers 
by Chebyshev polynomials, whose recurrence relations make 
them readily amenable to distributed computation in a sensor 
network. Key takeaways from the discussion and application 
examples include: 

• A number of distributed signal processing tasks can 
be represented as distributed applications of unions of 
graph Fourier multiplier operators (and their adjoints) to 
signals on weighted graphs. Examples include distributed 
smoothing, denoising, and semi-supervised learning. 

• The graph Fourier multiplier operators are the graph ana- 
log of filter banks, as they reshape functions' frequencies 
through multiplication in the Fourier domain. 

• The amount of communication required to perform the 
distributed computations only scales with the size of the 
network through the number of edges of the communica- 
tion graph, which is usually sparse. Therefore, the method 
is well suited to large-scale sensor networks. 

Our ongoing work includes extending the scope and depth 
of our application examples. In addition to considering more 
applications and larger size networks, we plan a more thorough 
empirical comparison of the computation and communication 
requirements of the approach described in this paper to al- 
ternative distributed optimization methods. The second major 
line of ongoing work is to analyze robustness issues that arise 
in real networks. For instance, we would like to incorporate 
quantization and communication noise into the sensor network 
model, in order to see how these propagate when using the 
Chebyshev polynomial approximation approach to distributed 
signal processing tasks. It is also important to analyze the 
effects of a sensor node dropping out of the network or 
communicating nodes losing synchronicity to ensure that the 
proposed method is stable to these disturbances. 